Philippine Poverty Area Estimates¶

In [ ]:
import os
import json
import pickle
import pandas as pd
import numpy as np
import seaborn as sns
import cufflinks as cf
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
cf.go_offline()

Philippines: Povery Statistics Dataset¶

Based on Republic Act 8425, otherwise known as Social Reform and Poverty Alleviation Act, dated 11 December 1997, the poor refers to individuals and families whose income fall below the poverty threshold as defined by the government and/or those that cannot afford in a sustained manner to provide their basic needs of food, health, education, housing and other amenities of life.

  • Data source: United Nations Office for the Coordination of Humanitarian Affairs (UN OHCA) and Philippine Statistics Authority (PSA)
  • Data from: https://data.humdata.org/dataset/philippines-poverty-statistics
  • Geojson file from: faeldon/philippines-json-maps

Initialization¶

Loading the dataset¶

Starting with the given metadata or the catalog of the dataset

In [ ]:
df_metadata = pd.read_csv(r'data/csv/190710_poverty-statistics-metadata.csv').dropna()
df_metadata
Out[ ]:
Field Content
0 Region Code Modified PSGC code of the Region. This is a tw...
1 Region Source of the PSGC code for the Region
2 Province Code Modified PSGC code of the Province. This is a ...
3 Province Source of the PSGC code for the Province
4 City_Mun Code Modified PSGC code of the City/Municipality. T...
5 City_Municipality Source of the PSGC code for the City/Municipal...
7 Glossary of terms https://psa.gov.ph/poverty-press-releases/glos...
8 Poverty estimation methodoloy For details of poverty estimation methodology,...
9 Poverty data https://psa.gov.ph/poverty-sae-press-releases/...
10 Technical notes https://psa.gov.ph/poverty-press-releases/tech...
12 Notes: 1. "PH" in front of the PSGC coding scheme in ...

Converting the metadata dataframe to dictionary for easy access of content

In [ ]:
dict_metadata = dict(zip(df_metadata['Field'].values, df_metadata['Content'].values))
In [ ]:
print(dict_metadata['Notes:'])
1. "PH" in front of the PSGC coding scheme in order to solve the problem of the "0" that falls in several application  
2. The standard deviation of an estimate can be derived by multiplying the poverty incidence and coefficient of variation then divide by 100. 
3. The Municipality of Bumbaran, Lanao del Sur was renamed as Municipality of Amai Manabilang per Muslim Mindanao Autonomy Acto No. 316, series of 2014 and was ratified through a plebiscite on 07 April 2018.

Reading the main dataset

In [ ]:
df = pd.read_csv(r'data/csv/190710_poverty-statistics.csv')
In [ ]:
df.head()
Out[ ]:
Region Region code Province Province code Annual Per Capita Poverty Threshold \n(in Php)_2006 Annual Per Capita Poverty Threshold \n(in Php)_2009 Annual Per Capita Poverty Threshold \n(in Php)_2012 Annual Per Capita Poverty Threshold \n(in Php)_2015 Poverty Incidence among Families (%)_Est (%)_2006 Poverty Incidence among Families (%)_Est (%)_2009 Poverty Incidence among Families (%)_Est (%)_2012 Poverty Incidence among Families (%)_Est (%)_2015 Poverty Incidence among Families (%)_CV_2006 Poverty Incidence among Families (%)_CV_2009 Poverty Incidence among Families (%)_CV_2012 Poverty Incidence among Families (%)_CV_2015 Magnitude of Poor Families_Est_2006 Magnitude of Poor Families_Est_2009 Magnitude of Poor Families_Est_2012 Magnitude of Poor Families_Est_2015
0 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, CITY OF MANILA, FIRST DISTRICT (Not a Pro... PH133900000 15,699 19,227 20,344 25,007 2.7 3.2 3.6 3.5 29.6 48.7 24.8 26.7 9,906 12,405 14,343 12,710
1 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, SECOND DISTRICT (Not a Province)b/ PH137400000 15,699 19,227 20,344 25,007 3.2 2.2 1.9 1.9 48.0 21.4 25.6 40.4 28,247 21,168 19,782 21,727
2 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, THIRD DISTRICT (Not a Province) PH137500000 15,699 19,227 20,344 25,007 3.3 3.2 2.8 3.3 19.5 23.9 23.7 19.6 18,631 19,306 18,266 22,352
3 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, FOURTH DISTRICT (Not a Province)b/ PH137600000 15,699 19,227 20,344 25,007 2.4 1.5 3.0 2.8 21.8 35.0 21.5 23.5 16,570 11,095 24,138 23,457
4 CORDILLERA ADMINISTRATIVE REGION (CAR) PH140000000 ABRA PH140100000 14,680 17,852 19,775 21,240 39.4 38.9 27.2 19.9 10.8 21.0 21.2 13.2 18,054 18,852 13,914 12,400

Including only the geographical information and the magnitude of poor families estimation by year column for later on easy to digest visualization

In [ ]:
columns_to_include = [x for x in df.columns if not (x.startswith('Poverty') or x.startswith('Annual'))]
columns_to_include
Out[ ]:
['Region',
 'Region code',
 'Province',
 'Province code',
 'Magnitude of Poor Families_Est_2006',
 'Magnitude of Poor Families_Est_2009',
 'Magnitude of Poor Families_Est_2012',
 'Magnitude of Poor Families_Est_2015']

Extracting the data with the columns to include and turning it to a dataframe

In [ ]:
df = df[columns_to_include]
In [ ]:
df.head()
Out[ ]:
Region Region code Province Province code Magnitude of Poor Families_Est_2006 Magnitude of Poor Families_Est_2009 Magnitude of Poor Families_Est_2012 Magnitude of Poor Families_Est_2015
0 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, CITY OF MANILA, FIRST DISTRICT (Not a Pro... PH133900000 9,906 12,405 14,343 12,710
1 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, SECOND DISTRICT (Not a Province)b/ PH137400000 28,247 21,168 19,782 21,727
2 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, THIRD DISTRICT (Not a Province) PH137500000 18,631 19,306 18,266 22,352
3 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, FOURTH DISTRICT (Not a Province)b/ PH137600000 16,570 11,095 24,138 23,457
4 CORDILLERA ADMINISTRATIVE REGION (CAR) PH140000000 ABRA PH140100000 18,054 18,852 13,914 12,400

Dataframe cleaning¶

In [ ]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 8 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   Region                               85 non-null     object
 1   Region code                          85 non-null     object
 2   Province                             85 non-null     object
 3   Province code                        85 non-null     object
 4   Magnitude of Poor Families_Est_2006  85 non-null     object
 5   Magnitude of Poor Families_Est_2009  85 non-null     object
 6   Magnitude of Poor Families_Est_2012  85 non-null     object
 7   Magnitude of Poor Families_Est_2015  85 non-null     object
dtypes: object(8)
memory usage: 5.4+ KB

Stripping all the columns of the all the possible leading and trailing white spaces

In [ ]:
df = df.applymap(str.strip)

Stripping the province column of the trailing 'b/' characters

In [ ]:
df['Province'] = df['Province'].str.rstrip(r'b/')

Checking the magnitude estimation columns of possible parsing problems

In [ ]:
magnitude_pf_est_by_year_column_names = [x for x in columns_to_include if x.startswith('Magnitude')]
In [ ]:
print("Columns with complication: ")
for column in magnitude_pf_est_by_year_column_names:
    try:
        pd.to_numeric(df[column].squeeze())
    except:
        print(f"\t{column}")
Columns with complication: 
	Magnitude of Poor Families_Est_2006
	Magnitude of Poor Families_Est_2009
	Magnitude of Poor Families_Est_2012
	Magnitude of Poor Families_Est_2015
In [ ]:
for column in magnitude_pf_est_by_year_column_names:
    print(column)
    print(f"{df[column].squeeze().str.strip().sort_values().head(3)}\n")
Magnitude of Poor Families_Est_2006
19    10,211
30    11,576
9     11,920
Name: Magnitude of Poor Families_Est_2006, dtype: object

Magnitude of Poor Families_Est_2009
52     10,701
68    100,860
13    103,487
Name: Magnitude of Poor Families_Est_2009, dtype: object

Magnitude of Poor Families_Est_2012
81    100,946
46    102,924
59    104,133
Name: Magnitude of Poor Families_Est_2012, dtype: object

Magnitude of Poor Families_Est_2015
14         -
45     1,594
75    10,570
Name: Magnitude of Poor Families_Est_2015, dtype: object

Replacing the comma with empty space and the dash with none type as to drop all of it with dropna() later on

In [ ]:
def clean_df_columns(dataframe: pd.DataFrame, columns_to_clean: list):
    for column in columns_to_clean:
        df[column] = (
            df[column]
            .str.replace(',','')
            .replace('-', None))

    return df

Cleaning and parsing magnitude_pf_est_by_year_column_names columns to int64

In [ ]:
df = (
    clean_df_columns(
        dataframe = df.convert_dtypes(), 
        columns_to_clean = magnitude_pf_est_by_year_column_names)
    .dropna()
    .astype(dict(zip(
        magnitude_pf_est_by_year_column_names,
        ['int64' for x in range(len(magnitude_pf_est_by_year_column_names))]))) )

Turning 'Region code' column to categorical

In [ ]:
region_codes_ordered = [region_code for region_code in df['Region code'].unique()]
region_codes_ordered.sort()

df['Region code'] = pd.Categorical(
    df['Region code'], categories = region_codes_ordered,
    ordered = True, )

Adding total magnitude column by region code

In [ ]:
for region_code in region_codes_ordered:
    for magnitude_pf_est_by_year_column_name in magnitude_pf_est_by_year_column_names:
        sum = df.loc[df['Region code'].isin([region_code]), [magnitude_pf_est_by_year_column_name]].sum().get(0)
        df.loc[df['Region code'].isin([region_code]), [f'Total {magnitude_pf_est_by_year_column_name}']] = sum
In [ ]:
total_magnitude_pf_est_by_year_column_names = [x for x in df.columns[df.columns.str.startswith('Total')]]
total_magnitude_pf_est_by_year_column_names
Out[ ]:
['Total Magnitude of Poor Families_Est_2006',
 'Total Magnitude of Poor Families_Est_2009',
 'Total Magnitude of Poor Families_Est_2012',
 'Total Magnitude of Poor Families_Est_2015']
In [ ]:
df = (
    df.astype(dict(zip(
        total_magnitude_pf_est_by_year_column_names,
        ["int64" for x in range(len(total_magnitude_pf_est_by_year_column_names))] ))))
In [ ]:
df[['Region', 'Region code'] + total_magnitude_pf_est_by_year_column_names].drop_duplicates().sort_values('Region code')
Out[ ]:
Region Region code Total Magnitude of Poor Families_Est_2006 Total Magnitude of Poor Families_Est_2009 Total Magnitude of Poor Families_Est_2012 Total Magnitude of Poor Families_Est_2015
10 REGION I (ILOCOS REGION) PH010000000 191326 172726 154712 112233
15 REGION II (CAGAYAN VALLEY) PH020000000 141954 143148 130154 95367
19 REGION III (CENTRAL LUZON) PH030000000 206568 232928 240079 223684
26 REGION IV-A (CALABARZON) PH040000000 189691 241158 256839 216461
36 REGION V (BICOL REGION) PH050000000 361802 385522 375974 346965
42 REGION VI (WESTERN VISAYAS) PH060000000 316668 353431 365041 281826
48 REGION VII (CENTRAL VISAYAS) PH070000000 411430 378221 405694 394336
52 REGION VIII (EASTERN VISAYAS) PH080000000 271319 293886 337220 299898
58 REGION IX (ZAMBOANGA PENINSULA) PH090000000 260618 280272 259750 214010
62 REGION X (NORTHERN MINDANAO) PH100000000 263981 298473 320114 311552
67 REGION XI (DAVAO REGION) PH110000000 229800 252151 268957 192449
71 REGION XII (SOCCSKSARGEN) PH120000000 250169 274042 366170 321287
0 NATIONAL CAPITAL REGION (NCR) PH130000000 73354 63974 76529 80246
4 CORDILLERA ADMINISTRATIVE REGION (CAR) PH140000000 66606 66111 65515 59760
80 AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM) PH150000000 205834 212494 271355 296998
76 REGION XIII (Caraga) PH160000000 191315 227453 169522 178159
31 MIMAROPA REGION PH170000000 176282 160227 150486 121283
In [ ]:
df.dtypes
Out[ ]:
Region                                         object
Region code                                  category
Province                                       object
Province code                                  object
Magnitude of Poor Families_Est_2006             int64
Magnitude of Poor Families_Est_2009             int64
Magnitude of Poor Families_Est_2012             int64
Magnitude of Poor Families_Est_2015             int64
Total Magnitude of Poor Families_Est_2006       int64
Total Magnitude of Poor Families_Est_2009       int64
Total Magnitude of Poor Families_Est_2012       int64
Total Magnitude of Poor Families_Est_2015       int64
dtype: object
In [ ]:
df.head()
Out[ ]:
Region Region code Province Province code Magnitude of Poor Families_Est_2006 Magnitude of Poor Families_Est_2009 Magnitude of Poor Families_Est_2012 Magnitude of Poor Families_Est_2015 Total Magnitude of Poor Families_Est_2006 Total Magnitude of Poor Families_Est_2009 Total Magnitude of Poor Families_Est_2012 Total Magnitude of Poor Families_Est_2015
0 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, CITY OF MANILA, FIRST DISTRICT (Not a Pro... PH133900000 9906 12405 14343 12710 73354 63974 76529 80246
1 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, SECOND DISTRICT (Not a Province) PH137400000 28247 21168 19782 21727 73354 63974 76529 80246
2 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, THIRD DISTRICT (Not a Province) PH137500000 18631 19306 18266 22352 73354 63974 76529 80246
3 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, FOURTH DISTRICT (Not a Province) PH137600000 16570 11095 24138 23457 73354 63974 76529 80246
4 CORDILLERA ADMINISTRATIVE REGION (CAR) PH140000000 ABRA PH140100000 18054 18852 13914 12400 66606 66111 65515 59760
In [ ]:
df.describe()
Out[ ]:
Magnitude of Poor Families_Est_2006 Magnitude of Poor Families_Est_2009 Magnitude of Poor Families_Est_2012 Magnitude of Poor Families_Est_2015 Total Magnitude of Poor Families_Est_2006 Total Magnitude of Poor Families_Est_2009 Total Magnitude of Poor Families_Est_2012 Total Magnitude of Poor Families_Est_2015
count 84.000000 84.000000 84.000000 84.000000 84.000000 84.000000 84.000000 84.000000
mean 45341.869048 48050.202381 50167.988095 44601.357143 225872.845238 240808.880952 252727.916667 225018.892857
std 37229.006994 38136.162504 40596.601751 38303.600739 87258.552190 93499.046159 104561.801013 98182.334678
min 3479.000000 3642.000000 3429.000000 1594.000000 66606.000000 63974.000000 65515.000000 59760.000000
25% 15409.000000 15604.750000 17907.500000 14672.750000 189691.000000 172726.000000 154712.000000 121283.000000
50% 39949.000000 43009.500000 39198.500000 32290.500000 206568.000000 241158.000000 259750.000000 223684.000000
75% 61509.250000 69217.500000 75694.000000 63847.250000 271319.000000 295032.750000 344175.250000 299898.000000
max 209301.000000 200481.000000 185603.000000 179162.000000 411430.000000 385522.000000 405694.000000 394336.000000

Loading geojson data¶

In [ ]:
geo_ph_regions = json.load(open(r'data/geojson/regions/regions.0.01.json'))

Execution¶

Peeking to geojson contents and looking for key to map with the dataframe

In [ ]:
geo_ph_regions['features']
In [ ]:
df.head(5)
Out[ ]:
Region Region code Province Province code Magnitude of Poor Families_Est_2006 Magnitude of Poor Families_Est_2009 Magnitude of Poor Families_Est_2012 Magnitude of Poor Families_Est_2015 Total Magnitude of Poor Families_Est_2006 Total Magnitude of Poor Families_Est_2009 Total Magnitude of Poor Families_Est_2012 Total Magnitude of Poor Families_Est_2015
0 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, CITY OF MANILA, FIRST DISTRICT (Not a Pro... PH133900000 9906 12405 14343 12710 73354 63974 76529 80246
1 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, SECOND DISTRICT (Not a Province) PH137400000 28247 21168 19782 21727 73354 63974 76529 80246
2 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, THIRD DISTRICT (Not a Province) PH137500000 18631 19306 18266 22352 73354 63974 76529 80246
3 NATIONAL CAPITAL REGION (NCR) PH130000000 NCR, FOURTH DISTRICT (Not a Province) PH137600000 16570 11095 24138 23457 73354 63974 76529 80246
4 CORDILLERA ADMINISTRATIVE REGION (CAR) PH140000000 ABRA PH140100000 18054 18852 13914 12400 66606 66111 65515 59760

Saving to local¶

In [ ]:
def df_export_to_csv():
    df.to_csv(r'data/csv/poverty_statistics_clnd.csv', index=False)
In [ ]:
def save_fig_binary(figure, file_name: str, folder_name: str, ) -> None:
    """Saves figure to binary format"""

    data_root_path = f'data/bin/{folder_name}'
    if not os.path.exists('data'):
        os.mkdir('data')
    if not os.path.exists('data/bin'):
        os.mkdir('data/bin')
    if not os.path.exists(data_root_path):
        os.mkdir(data_root_path)

    with open(f'{data_root_path}/{file_name}.bin', 'wb') as file:
        pickle.dump(figure, file)
In [ ]:
def load_fig_binary(file_name: str, folder_name: str):
    """Loads and returns binary figure"""

    data_root_path = f'data/bin/{folder_name}'
    if os.path.exists(f'{data_root_path}/{file_name}.bin'):
        file = open(f'{data_root_path}/{file_name}.bin', 'rb')
        figure = pickle.load(file)
        file.close()
        return figure
    else:
        return None
In [ ]:
df_export_to_csv()

Writing and reading figures¶

In [ ]:
year_by_column_name_total = dict(zip(['2006', '2009', '2012', '2015'], total_magnitude_pf_est_by_year_column_names))
year_by_column_name = dict(zip(['2006', '2009', '2012', '2015'], magnitude_pf_est_by_year_column_names))

Making a function to automate workflow in plotting by region with year as the argument

In [ ]:
def figure_update_layout(fig):
    fig.update_layout(
    coloraxis_colorbar = dict(title = 'Magnitude'), 
    margin={"r":0,"t":0,"l":0,"b":0}, 
    modebar_bgcolor = 'rgba(0,0,0,0)',
    modebar_color = '#6d0006',
    modebar_activecolor = '#323140',
    modebar_orientation = 'v')
In [ ]:
def plot_by_region(dataframe, year: str):
    file_name = f'by_region_{year}'
    folder_name = 'regions'

    fig = load_fig_binary(file_name = file_name, folder_name = folder_name)
    if fig is not None:
        return fig
    else:
        geojson_path = 'data/geojson'
        geojson_folder_name = 'regions'
        geojson_file_name = 'regions.0.01'
        geojson_file = json.load(open(rf'{geojson_path}/{geojson_folder_name}/{geojson_file_name}.json'))

        fig = px.choropleth(
            data_frame = dataframe,
            geojson = geojson_file,
            featureidkey = 'properties.ADM1_PCODE',
            locations = 'Region code',
            color = year_by_column_name[year],
            scope = 'asia',
            color_continuous_scale = px.colors.sequential.Reds, 
            custom_data = ['Region', 'Region code', year_by_column_name_total[year]])

        fig.update_traces(
            hovertemplate = '<br>'.join([
                'Region: %{customdata[0]}', 
                'Region code: %{customdata[1]}',
                '<b>Magnitude of poor families estimation: %{customdata[2]:,}</b>' ]),
        )

        fig.update_geos(fitbounds = 'locations', visible = False,)
        
        figure_update_layout(fig)
        
        save_fig_binary(figure = fig, file_name = file_name, folder_name = folder_name)
        return fig

Making a function to automate workflow in plotting by province with year as the argument

In [ ]:
region_name_by_region_code = dict(
    df[['Region', 'Region code']]
    .value_counts()
    .reset_index()
    .iloc[:, [0,1]]
    .sort_values('Region code')
    .values)

region_name_by_region_code
Out[ ]:
{'REGION I (ILOCOS REGION)': 'PH010000000',
 'REGION II (CAGAYAN VALLEY)': 'PH020000000',
 'REGION III (CENTRAL LUZON)': 'PH030000000',
 'REGION IV-A (CALABARZON)': 'PH040000000',
 'REGION V (BICOL REGION)': 'PH050000000',
 'REGION VI (WESTERN VISAYAS)': 'PH060000000',
 'REGION VII (CENTRAL VISAYAS)': 'PH070000000',
 'REGION VIII (EASTERN VISAYAS)': 'PH080000000',
 'REGION IX (ZAMBOANGA PENINSULA)': 'PH090000000',
 'REGION X (NORTHERN MINDANAO)': 'PH100000000',
 'REGION XI (DAVAO REGION)': 'PH110000000',
 'REGION XII (SOCCSKSARGEN)': 'PH120000000',
 'NATIONAL CAPITAL REGION (NCR)': 'PH130000000',
 'CORDILLERA ADMINISTRATIVE REGION (CAR)': 'PH140000000',
 'AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)': 'PH150000000',
 'REGION XIII (Caraga)': 'PH160000000',
 'MIMAROPA REGION': 'PH170000000'}
In [ ]:
def plot_by_province(dataframe, year: str, region_name: str):
    region_code = region_name_by_region_code[region_name]
    file_name = f'by_region_{region_code}_{year}'
    folder_name = 'provinces'

    fig = load_fig_binary(file_name = file_name, folder_name = folder_name)
    if fig is not None:
        return fig
    else:
        dataframe = dataframe.loc[dataframe['Region'].isin([region_name])]

        geojson_path = 'data/geojson'
        geojson_folder_name = 'provinces'
        geojson_file_name = f'provinces-region-{region_code.lower()}.0.01'
        geojson_file = json.load(open(rf'{geojson_path}/{geojson_folder_name}/{geojson_file_name}.json'))

        fig = px.choropleth(
            data_frame = dataframe,
            geojson = geojson_file,
            featureidkey = 'properties.ADM2_PCODE',
            locations = 'Province code',
            color = year_by_column_name[year],
            scope = 'asia',
            color_continuous_scale = px.colors.sequential.Reds, 
            custom_data = ['Province', 'Province code', year_by_column_name[year]],
            basemap_visible=False, )

        fig.update_traces(
            hovertemplate = '<br>'.join([
                'Province: %{customdata[0]}', 
                'Province code: %{customdata[1]}',
                '<b>Magnitude of poor families estimation: %{customdata[2]:,}</b>' ]),
        )

        fig.update_geos(fitbounds = 'locations', visible = False,)
        
        figure_update_layout(fig)

        save_fig_binary(figure = fig, file_name = file_name, folder_name = folder_name)
        return fig

Visualizing with plotly¶

Magnitude of poor families estimation visualization by regions

In [ ]:
plot_by_region(dataframe = df, year ='2006')

Magnitude of poor families estimation visualization by province

In [ ]:
plot_by_province(dataframe = df, year = '2006', region_name = 'REGION IV-A (CALABARZON)')